~GitHub/inattention-populationsample/code/inattention-data-prep.Rmd
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter. Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I. When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
Organization of the data and the analysis:
Input file:
Output files (data):
# D <- read.csv(file = "../data/inattention_nomiss_2397x12.csv")
# The original SPSS file as provided to AJL is
# 'inattention_Astri_94_96_new_grades_updated.sav'
# and being edited and reduced by AJL to 'inattention_Arvid_new.sav'
# Import data stored in the SPSS format
library(memisc)
Loading required package: lattice
Attaching package: ‘lattice’
The following object is masked from ‘package:boot’:
melanoma
Loading required package: MASS
Attaching package: ‘memisc’
The following object is masked from ‘package:BBmisc’:
%nin%
The following objects are masked from ‘package:stats’:
contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:
as.array
# fn <- "../data/inattention_Arvid_new.sav"
fn <- "/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav"
data <- as.data.set(spss.system.file(fn))
library(foreign)
fn_age <- "/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Astri_94_96_new_grades_updated.sav"
Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
#Sys.setlocale(locale="C")
data_age <- read.spss(fn_age, to.data.frame=TRUE, use.value.labels=FALSE)
/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Astri_94_96_new_grades_updated.sav: Unrecognized record type 7, subtype 14 encountered in system file/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Astri_94_96_new_grades_updated.sav: Unrecognized record type 7, subtype 18 encountered in system file/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Astri_94_96_new_grades_updated.sav: Unrecognized record type 7, subtype 24 encountered in system filere-encoding from latin1
#names(data_age)
dim(data_age)
[1] 10870 496
age_c4 = data_age$c_4_age_at_completion
summary(age_c4)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
16.02 16.73 17.40 17.49 18.17 19.99 364
# Make new data frame from the sample with the variables
# gender, grade, SNAP1, ..., SNAP9 (vars #1-11) and
# academic_achievement (var #52)
names(data)
[1] "gender" "grade" "snap1" "snap2"
[5] "snap3" "snap4" "snap5" "snap6"
[9] "snap7" "snap8" "snap9" "snap10"
[13] "snap11" "snap12" "snap13" "snap14"
[17] "snap15" "snap16" "snap17" "snap18"
[21] "y_4_asrs_1" "y_4_asrs_2" "y_4_asrs_3" "y_4_asrs_4"
[25] "y_4_asrs_5" "y_4_asrs_6" "y_4_asrs_7" "y_4_asrs_8"
[29] "y_4_asrs_9" "y_4_asrs_10" "y_4_asrs_11" "y_4_asrs_12"
[33] "y_4_asrs_13" "y_4_asrs_14" "y_4_asrs_15" "y_4_asrs_16"
[37] "y_4_asrs_17" "y_4_asrs_18" "y_4_mfq_1" "y_4_mfq_2"
[41] "y_4_mfq_3" "y_4_mfq_4" "y_4_mfq_5" "y_4_mfq_6"
[45] "y_4_mfq_7" "y_4_mfq_8" "y_4_mfq_9" "y_4_mfq_10"
[49] "y_4_mfq_11" "y_4_mfq_12" "y_4_mfq_13" "academic_achievement"
d <- data[, c(1:11, 52)]
dim(d)
[1] 10870 12
names(d)
[1] "gender" "grade" "snap1" "snap2"
[5] "snap3" "snap4" "snap5" "snap6"
[9] "snap7" "snap8" "snap9" "academic_achievement"
str(d)
Data set with 10870 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num NA NA NA NA NA NA NA NA NA NA ...
$ grade : Itvl. item + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ academic_achievement: Itvl. item num 2.86 NA 3 3.67 4.1 ...
summary(d)
gender grade snap1 snap2 snap3
Girl:5528 Min. : 2.00 Not true :2646 Not true :2698 Not true :2810
Boy :4978 1st Qu. : 2.00 Somewhat true : 350 Somewhat true : 294 Somewhat true : 225
* : 0 Median : 3.00 Certainly true: 61 Certainly true: 65 Certainly true: 23
NAs : 364 Mean : 2.84 * : 0 * : 0 * : 0
3rd Qu. : 3.50 NAs :7813 NAs :7813 NAs :7812
Max. : 4.00
Missings: 0.00
NAs :7719.00
snap4 snap5 snap6 snap7
Not true :2806 Not true :2783 Not true :2784 Not true :2927
Somewhat true : 229 Somewhat true : 225 Somewhat true : 223 Somewhat true : 96
Certainly true: 22 Certainly true: 49 Certainly true: 49 Certainly true: 18
* : 0 * : 0 * : 0 * : 0
NAs :7813 NAs :7813 NAs :7814 NAs :7829
snap8 snap9 academic_achievement
Not true :2260 Not true :2733 Min. : 1.000
Somewhat true : 669 Somewhat true : 288 1st Qu. : 3.286
Certainly true: 127 Certainly true: 37 Median : 3.889
* : 0 * : 0 Mean : 3.824
NAs :7814 NAs :7812 3rd Qu. : 4.444
Max. : 6.000
Missings: 0.000
NAs :2204.000
dd <- d
dd$age <- age_c4
summary(dd)
gender grade snap1 snap2 snap3
Girl:5528 Min. : 2.00 Not true :2646 Not true :2698 Not true :2810
Boy :4978 1st Qu. : 2.00 Somewhat true : 350 Somewhat true : 294 Somewhat true : 225
* : 0 Median : 3.00 Certainly true: 61 Certainly true: 65 Certainly true: 23
NAs : 364 Mean : 2.84 * : 0 * : 0 * : 0
3rd Qu. : 3.50 NAs :7813 NAs :7813 NAs :7812
Max. : 4.00
Missings: 0.00
NAs :7719.00
snap4 snap5 snap6 snap7
Not true :2806 Not true :2783 Not true :2784 Not true :2927
Somewhat true : 229 Somewhat true : 225 Somewhat true : 223 Somewhat true : 96
Certainly true: 22 Certainly true: 49 Certainly true: 49 Certainly true: 18
* : 0 * : 0 * : 0 * : 0
NAs :7813 NAs :7813 NAs :7814 NAs :7829
snap8 snap9 academic_achievement age
Not true :2260 Not true :2733 Min. : 1.000 Min. :16.02
Somewhat true : 669 Somewhat true : 288 1st Qu. : 3.286 1st Qu.:16.73
Certainly true: 127 Certainly true: 37 Median : 3.889 Median :17.40
* : 0 * : 0 Mean : 3.824 Mean :17.49
NAs :7814 NAs :7812 3rd Qu. : 4.444 3rd Qu.:18.17
Max. : 6.000 Max. :19.99
Missings: 0.000 NA's :364
NAs :2204.000
# Get observations of data frame that have missing values and those with complete cases
library(psych)
d.miss <- d[!complete.cases(d),]
d.nomiss <- d[complete.cases(d),]
str(d.nomiss)
Data set with 2397 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 0 ...
$ grade : Itvl. item + ms.v. num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 1 0 0 0 0 0 0 0 0 ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 1 0 ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ academic_achievement: Itvl. item num 4.67 3.67 4.14 4.11 4.3 ...
headTail(as.data.frame(d.nomiss))
summary(d.nomiss)
gender grade snap1 snap2 snap3
Girl:1256 Min. :2.000 Not true :2079 Not true :2117 Not true :2201
Boy :1141 1st Qu.:2.000 Somewhat true : 272 Somewhat true : 230 Somewhat true : 181
Median :3.000 Certainly true: 46 Certainly true: 50 Certainly true: 15
Mean :2.814
3rd Qu.:3.000
Max. :4.000
snap4 snap5 snap6 snap7
Not true :2217 Not true :2190 Not true :2195 Not true :2312
Somewhat true : 164 Somewhat true : 176 Somewhat true : 170 Somewhat true : 73
Certainly true: 16 Certainly true: 31 Certainly true: 32 Certainly true: 12
snap8 snap9 academic_achievement
Not true :1794 Not true :2142 Min. :1.000
Somewhat true : 510 Somewhat true : 228 1st Qu.:3.556
Certainly true: 93 Certainly true: 27 Median :4.083
Mean :4.023
3rd Qu.:4.556
Max. :5.900
D1 <- d.nomiss # For later use
dd.nomiss <- dd[complete.cases(dd),]
summary(dd.nomiss)
gender grade snap1 snap2 snap3
Girl:1256 Min. :2.000 Not true :2079 Not true :2117 Not true :2201
Boy :1141 1st Qu.:2.000 Somewhat true : 272 Somewhat true : 230 Somewhat true : 181
Median :3.000 Certainly true: 46 Certainly true: 50 Certainly true: 15
Mean :2.814
3rd Qu.:3.000
Max. :4.000
snap4 snap5 snap6 snap7
Not true :2217 Not true :2190 Not true :2195 Not true :2312
Somewhat true : 164 Somewhat true : 176 Somewhat true : 170 Somewhat true : 73
Certainly true: 16 Certainly true: 31 Certainly true: 32 Certainly true: 12
snap8 snap9 academic_achievement age
Not true :1794 Not true :2142 Min. :1.000 Min. :16.06
Somewhat true : 510 Somewhat true : 228 1st Qu.:3.556 1st Qu.:16.69
Certainly true: 93 Certainly true: 27 Median :4.083 Median :17.32
Mean :4.023 Mean :17.40
3rd Qu.:4.556 3rd Qu.:18.03
Max. :5.900 Max. :19.22
ss = summary(D1$gender)
tt <- with(subset(dd.nomiss, gender %in% c("Girl", "Boy")),
t.test(age ~ factor(gender)))
tt
Welch Two Sample t-test
data: age by factor(gender)
t = 2.1129, df = 2384.7, p-value = 0.03472
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.005156208 0.138277404
sample estimates:
mean in group Girl mean in group Boy
17.43525 17.36354
.. and information about gender and academic achievement when they participated in the fourth study wave - in total 2397 participants, 1256 Girls and 1141 Boys. Mean age when included in wave 4 was 17.4011148 (16.95) years and SD 0.8317614 (SD = .846), with a slightly higher mean age in girls compared to boys (p=0.0347158)
nonsignificant age-difference between girls and boys (p = .088).
summary(D1$snap1[D1$gender == "Boy"])
Not true Somewhat true Certainly true
935 176 30
summary(D1$snap1[D1$gender == "Girl"])
Not true Somewhat true Certainly true
1144 96 16
# Association Statistics
# Computes the Pearson chi-Squared test, the Likelihood Ratio chi-Squared test,
# the phi coefficient, the contingency coefficient and Cramer's V for possibly
# stratified contingency tables.
library(vcd)
Loading required package: grid
Attaching package: ‘grid’
The following object is masked from ‘package:BBmisc’:
explode
tab <- xtabs(~gender + grade, data = D1)
summary(assocstats(tab))
Call: xtabs(formula = ~gender + grade, data = D1)
Number of cases in table: 2397
Number of factors: 2
Test for independence of all factors:
Chisq = 8.373, df = 2, p-value = 0.0152
X^2 df P(> X^2)
Likelihood Ratio 8.3991 2 0.015002
Pearson 8.3731 2 0.015199
Phi-Coefficient : NA
Contingency Coeff.: 0.059
Cramer's V : 0.059
# Save the nomiss D to an .csv file without row names for further analysis
D <- d.nomiss
write.csv(D, file = "../data/inattention_nomiss_2397x12.csv",row.names=FALSE)
# For simplicity, we rename (and translate) the variables names in the dataset D without any missing
library(plyr)
Attaching package: ‘plyr’
The following object is masked from ‘package:memisc’:
rename
d.nomiss <- read.csv(file = "../data/inattention_nomiss_2397x12.csv")
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap1 <- as.numeric(D$snap1)-1
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
D3 <- D # For later use
# Save D (at early stage) to an .csv file for later analysis in R or MATLAB
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2.csv",row.names=FALSE)
# For even more simplicity, we rename (and translate) the variables names in the dataset
# without any missing, reducing the predictor categories to be binary,
# i.e. collapsing SNAP values "1" and "2" to "1":
library(plyr)
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap1 <- as.numeric(D$snap1)-1
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 0 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 0 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
D2 <- D # For later use
# Save the new D to an .csv file without row names for further analysis
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1.csv",row.names=FALSE)
D <- D3
s <- dim(D)
n <- s[1]
p <- s[2]
txt = sprintf("Structure of the %d x %d DATASET", n, p)
print(txt)
[1] "Structure of the 2397 x 12 DATASET"
library(DiagrammeR)
n_txt = sprintf("Dataset \n (N = %d)", n);
gviz <- grViz("
# Circles: predictor variables; Triangle: Outcome variable
digraph Structure_of_the_dataset_D {
# node definitions with substituted label text
node [fontname = Helvetica]
1 [label = 'Dataset \n (N = 2397)', shape=box]
2 [label = 'gender \n {Girl (0) | Boy (1)}', shape=circle]
3 [label = 'grade \n {2 | 3 | 4}', shape=circle]
4 [label = 'ave \n (average marks) \n [1, 6] or {low (L) | medium (M) | high (H)}', shape=triangle]
a [label = 'SNAP \n {0 | 1 | 2}', shape=oval]
b [label = 'SNAP1', shape=circle]
c [label = 'SNAP2', shape=circle]
d [label = 'SNAP3', shape=circle]
e [label = 'SNAP4', shape=circle]
f [label = 'SNAP5', shape=circle]
g [label = 'SNAP6', shape=circle]
h [label = 'SNAP7', shape=circle]
i [label = 'SNAP8', shape=circle]
j [label = 'SNAP9', shape=circle]
# edge definitions with the node IDs
1 -> {2 3 a 4}
a -> {b c d e f g h i j}
}",
engine = "dot")
print(gviz)
NULL
# This does not work using DiagrammeR / GraphViz
# png("../manuscript/Figs/graph_design.png")
# print(gviz)
# dev.off()
# Uses Viewer, Zoom and Screen capture to produce .png and then
# data_prep_structure_grviz_20160203.pdf file
In our analysis we included n = 2397 individuals (none with missing data) from the dataset “/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav”.
D <- D3
n_txt = sprintf("In our analysis we included n = %d individuals (none with missing data) from the dataset '%s'\n", nrow(D), fn);
print(n_txt)
[1] "In our analysis we included n = 2397 individuals (none with missing data) from the dataset '/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav'\n"
We consider the grades (academic_achievement), as both a continuous (for regression) and discretized variable (for classification), where gjennomsnitt: - Item ‘Karaktergjennomsnitt alle gyldige karakterer 1-6 (ikke kroppsøving)’
# Discretized at three levels, with data-driven cutpoints (equifrequent levels)
D <- D3
aver <- D$ave
summary(aver)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 3.556 4.083 4.023 4.556 5.900
bins <- 3
cutpoints<-quantile(aver,(0:bins)/bins,names=FALSE)
print(cutpoints)
[1] 1.000000 3.750000 4.428571 5.900000
# Consistent with MATLAB 'histcounts' (D_20151110_analysis.m ; T2)
# fn2 = '../data/D_20151110.csv';
# T2 = readtable(fn2);
# bins = 3;
# y = quantile(T2.ave,[0:bins]/bins)
# [N,EDGES,BIN] = histcounts(T2.ave,y);
# cuts = sprintf('1:[%.2f, %.2f) 2:[%.2f,%.2f) 3:[%.2f,%.2f]', EDGES(1), EDGES(2), EDGES(2), EDGES(3), EDGES(3), EDGES(4));
# T2.ave_cat = BIN; % categorical(BIN,'Ordinal',true);
# descr = sprintf('%s - 1:low, 2:medium; 3:high average mark', cuts);
# T2.Properties.VariableDescriptions{'ave_cat'} = descr;
# => descr = 1:[1.00, 3.75) 2:[3.75,4.43) 3:[4.43,5.90] - 1:low, 2:medium; 3:high average mark
averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE)
summary(averBinned)
[1,3.75) [3.75,4.43) [4.43,5.9]
779 818 800
Make histogram of dicretized ‘averBinned’:
hist(as.numeric(averBinned))
Define grade categories “low”, “medium” and “high” in terms of the calculated cut-point intervals:
txt_low <- sprintf("low (L): [%.3f, %.3f)\n", cutpoints[[1]], cutpoints[[2]])
print(txt_low)
[1] "low (L): [1.000, 3.750)\n"
txt_medium <- sprintf("medium (M): [%.3f, %.3f)\n", cutpoints[[2]], cutpoints[[3]])
print(txt_medium)
[1] "medium (M): [3.750, 4.429)\n"
txt_high <- sprintf("high H): [%.3f, %.3f]\n", cutpoints[[3]], cutpoints[[4]])
print(txt_high)
[1] "high H): [4.429, 5.900]\n"
library(psych)
# Dataset for classification based on D3 and discretized average academic achievemnt
C <- D3
C$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("L","M","H"))
C <- subset(C, select = -c(ave))
str(C)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "L","M","H": 3 1 2 2 2 2 1 2 3 2 ...
headTail(as.data.frame(C))
headTail(as.data.frame(D3))
# Save the dataset C with binary SNAP predictors and trinary outcome to an .csv file
# for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_L_M_H.csv",row.names=FALSE)
# Dataset for classification based on D3 and discretized average academic achievemnt
E <- D3
E$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("0","1","2"))
E <- subset(E, select = -c(ave))
str(E)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "0","1","2": 3 1 2 2 2 2 1 2 3 2 ...
summary(E)
gender grade snap1 snap2 snap3 snap4
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069 Mean :1.062
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
snap5 snap6 snap7 snap8 snap9 averBinned
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 0:779
1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1:818
Median :1.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000 2:800
Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174 Mean :1.084
3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
headTail(as.data.frame(E))
headTail(as.data.frame(D3))
# Save the dataset E with numerical SNAP predictors and trinary outcome to an .csv file
# for further analysis
write.csv(E, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_0_1_2.csv",row.names=FALSE)
library(xtable)
Attaching package: ‘xtable’
The following object is masked from ‘package:CORElearn’:
display
C <- as.data.frame(C)
# select columns
cols <- c("gender", "grade", "snap1", "snap2", "snap3", "snap4", "snap5", "snap6", "snap7", "snap8", "snap9", "averBinned")
C[,cols] <- data.frame(apply(C[cols], 2, as.factor))
levels(C$gender) <- c("G", "B")
levels(C$grade) <- c("2nd", "3rd", "4th")
# N - not true (0)
# S - somewhat true (1)
# C - certainly true (2)
levels(C$snap1) <- c("N", "S", "C")
levels(C$snap2) <- c("N", "S", "C")
levels(C$snap3) <- c("N", "S", "C")
levels(C$snap4) <- c("N", "S", "C")
levels(C$snap5) <- c("N", "S", "C")
levels(C$snap6) <- c("N", "S", "C")
levels(C$snap7) <- c("N", "S", "C")
levels(C$snap8) <- c("N", "S", "C")
levels(C$snap9) <- c("N", "S", "C")
levels(C$averBinned) <- c("H", "L", "M") # numerical order = alphabetical order
str(C)
'data.frame': 2397 obs. of 12 variables:
$ gender : Factor w/ 2 levels "G","B": 2 2 2 2 2 2 2 2 2 2 ...
$ grade : Factor w/ 3 levels "2nd","3rd","4th": 1 1 1 1 1 1 1 1 1 1 ...
$ snap1 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap2 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap3 : Factor w/ 3 levels "N","S","C": 2 3 2 2 2 2 2 2 2 2 ...
$ snap4 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap5 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap6 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap7 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap8 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 3 2 ...
$ snap9 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ averBinned: Factor w/ 3 levels "H","L","M": 1 2 3 3 3 3 2 3 1 3 ...
headTail(C)
summary(C)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9
G:1141 2nd:1008 N: 46 N: 50 N: 15 N: 16 N: 31 N: 32 N: 12 N: 93 N: 27
B:1256 3rd: 827 S:2079 S:2117 S:2201 S:2217 S:2190 S:2195 S:2312 S:1794 S:2142
4th: 562 C: 272 C: 230 C: 181 C: 164 C: 176 C: 170 C: 73 C: 510 C: 228
averBinned
H:800
L:779
M:818
xtable(summary(C))
% latex table generated in R 3.3.2 by xtable 1.8-2 package
% Sat Dec 10 16:33:42 2016
\begin{table}[ht]
\centering
\begin{tabular}{rllllllllllll}
\hline
& gender & grade & snap1 & snap2 & snap3 & snap4 & snap5 & snap6 & snap7 & snap8 & snap9 & averBinned \\
\hline
1 & G:1141 & 2nd:1008 & N: 46 & N: 50 & N: 15 & N: 16 & N: 31 & N: 32 & N: 12 & N: 93 & N: 27 & H:800 \\
2 & B:1256 & 3rd: 827 & S:2079 & S:2117 & S:2201 & S:2217 & S:2190 & S:2195 & S:2312 & S:1794 & S:2142 & L:779 \\
3 & & 4th: 562 & C: 272 & C: 230 & C: 181 & C: 164 & C: 176 & C: 170 & C: 73 & C: 510 & C: 228 & M:818 \\
\hline
\end{tabular}
\end{table}
# Save the dataset C with SNAP predictors as factors and trinary outcome to an .csv file
# for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_N_S_C_outcome_is_L_M_H.csv",row.names=FALSE)
library(Hmisc)
Loading required package: survival
Attaching package: ‘survival’
The following object is masked from ‘package:boot’:
aml
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:xtable’:
label, label<-
The following objects are masked from ‘package:plyr’:
is.discrete, summarize
The following objects are masked from ‘package:memisc’:
%nin%, html
The following object is masked from ‘package:psych’:
describe
The following object is masked from ‘package:mlr’:
impute
The following object is masked from ‘package:BBmisc’:
%nin%
The following object is masked from ‘package:randomForest’:
combine
The following objects are masked from ‘package:base’:
format.pval, round.POSIXt, trunc.POSIXt, units
describe(C)
C
12 Variables 2397 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct
2397 0 2
Value G B
Frequency 1141 1256
Proportion 0.476 0.524
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
2397 0 3
Value 2nd 3rd 4th
Frequency 1008 827 562
Proportion 0.421 0.345 0.234
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
2397 0 3
Value N S C
Frequency 46 2079 272
Proportion 0.019 0.867 0.113
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
2397 0 3
Value N S C
Frequency 50 2117 230
Proportion 0.021 0.883 0.096
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
2397 0 3
Value N S C
Frequency 15 2201 181
Proportion 0.006 0.918 0.076
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
2397 0 3
Value N S C
Frequency 16 2217 164
Proportion 0.007 0.925 0.068
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
2397 0 3
Value N S C
Frequency 31 2190 176
Proportion 0.013 0.914 0.073
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
2397 0 3
Value N S C
Frequency 32 2195 170
Proportion 0.013 0.916 0.071
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
2397 0 3
Value N S C
Frequency 12 2312 73
Proportion 0.005 0.965 0.030
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
2397 0 3
Value N S C
Frequency 93 1794 510
Proportion 0.039 0.748 0.213
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
2397 0 3
Value N S C
Frequency 27 2142 228
Proportion 0.011 0.894 0.095
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct
2397 0 3
Value H L M
Frequency 800 779 818
Proportion 0.334 0.325 0.341
-----------------------------------------------------------------------------------------------------------
library(pander)
panderOptions("digits", 5)
pander(summary(C))
-------------------------------------------------------------------------
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7
-------- -------- ------- ------- ------- ------- ------- ------- -------
G:1141 2nd:1008 N: 46 N: 50 N: 15 N: 16 N: 31 N: 32 N: 12
B:1256 3rd: 827 S:2079 S:2117 S:2201 S:2217 S:2190 S:2195 S:2312
NA 4th: 562 C: 272 C: 230 C: 181 C: 164 C: 176 C: 170 C: 73
-------------------------------------------------------------------------
Table: Table continues below
----------------------------
snap8 snap9 averBinned
------- ------- ------------
N: 93 N: 27 H:800
S:1794 S:2142 L:779
C: 510 C: 228 M:818
----------------------------
pander(summary(E))
---------------------------------------------------------------------
gender grade snap1 snap2 snap3
------------- ------------- ------------- ------------- -------------
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000
---------------------------------------------------------------------
Table: Table continues below
--------------------------------------------------------------------
snap4 snap5 snap6 snap7 snap8
------------- ------------ ------------- ------------- -------------
Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :1.00 Median :1.000 Median :1.000 Median :1.000
Mean :1.062 Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174
3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.000 Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000
--------------------------------------------------------------------
Table: Table continues below
--------------------------
snap9 averBinned
------------- ------------
Min. :0.000 0:779
1st Qu.:1.000 1:818
Median :1.000 2:800
Mean :1.084 NA
3rd Qu.:1.000 NA
Max. :2.000 NA
--------------------------
Describe subsets of data according to academic achievement and gender
C.girls.L <- C[ which(C$gender=='G' & C$averBinned=='L'), ]
C.girls.H <- C[ which(C$gender=='G' & C$averBinned=='H'), ]
C.boys.L <- C[ which(C$gender=='B' & C$averBinned=='L'), ]
C.boys.H <- C[ which(C$gender=='B' & C$averBinned=='H'), ]
summary(C.girls.L)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G:447 2nd:183 N: 23 N: 33 N: 9 N: 9 N: 17 N: 16 N: 5 N: 48 N: 14 H: 0
B: 0 3rd:163 S:328 S:313 S:361 S:365 S:353 S:351 S:406 S:226 S:355 L:447
4th:101 C: 96 C:101 C: 77 C: 73 C: 77 C: 80 C: 36 C:173 C: 78 M: 0
summary(C.girls.H)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G:305 2nd:118 N: 1 N: 3 N: 0 N: 0 N: 5 N: 2 N: 0 N: 7 N: 2 H:305
B: 0 3rd:105 S:282 S:286 S:284 S:289 S:284 S:289 S:300 S:245 S:281 L: 0
4th: 82 C: 22 C: 16 C: 21 C: 16 C: 16 C: 14 C: 5 C: 53 C: 22 M: 0
summary(C.boys.L)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G: 0 2nd:128 N: 6 N: 3 N: 1 N: 0 N: 3 N: 3 N: 3 N: 12 N: 2 H: 0
B:332 3rd:100 S:284 S:284 S:311 S:304 S:303 S:299 S:321 S:245 S:295 L:332
4th:104 C: 42 C: 45 C: 20 C: 28 C: 26 C: 30 C: 8 C: 75 C: 35 M: 0
summary(C.boys.H)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G: 0 2nd:232 N: 2 N: 0 N: 0 N: 0 N: 0 N: 0 N: 0 N: 1 N: 1 H:495
B:495 3rd:146 S:474 S:488 S:486 S:491 S:489 S:493 S:493 S:450 S:476 L: 0
4th:117 C: 19 C: 7 C: 9 C: 4 C: 6 C: 2 C: 2 C: 44 C: 18 M: 0
library(Hmisc)
describe(C.girls.L)
C.girls.L
12 Variables 447 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
447 0 1 G
Value G
Frequency 447
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
447 0 3
Value 2nd 3rd 4th
Frequency 183 163 101
Proportion 0.409 0.365 0.226
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
447 0 3
Value N S C
Frequency 23 328 96
Proportion 0.051 0.734 0.215
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
447 0 3
Value N S C
Frequency 33 313 101
Proportion 0.074 0.700 0.226
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
447 0 3
Value N S C
Frequency 9 361 77
Proportion 0.020 0.808 0.172
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
447 0 3
Value N S C
Frequency 9 365 73
Proportion 0.020 0.817 0.163
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
447 0 3
Value N S C
Frequency 17 353 77
Proportion 0.038 0.790 0.172
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
447 0 3
Value N S C
Frequency 16 351 80
Proportion 0.036 0.785 0.179
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
447 0 3
Value N S C
Frequency 5 406 36
Proportion 0.011 0.908 0.081
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
447 0 3
Value N S C
Frequency 48 226 173
Proportion 0.107 0.506 0.387
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
447 0 3
Value N S C
Frequency 14 355 78
Proportion 0.031 0.794 0.174
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
447 0 1 L
Value L
Frequency 447
Proportion 1
-----------------------------------------------------------------------------------------------------------
describe(C.girls.H)
C.girls.H
12 Variables 305 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
305 0 1 G
Value G
Frequency 305
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
305 0 3
Value 2nd 3rd 4th
Frequency 118 105 82
Proportion 0.387 0.344 0.269
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
305 0 3
Value N S C
Frequency 1 282 22
Proportion 0.003 0.925 0.072
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
305 0 3
Value N S C
Frequency 3 286 16
Proportion 0.010 0.938 0.052
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
305 0 2
Value S C
Frequency 284 21
Proportion 0.931 0.069
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
305 0 2
Value S C
Frequency 289 16
Proportion 0.948 0.052
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
305 0 3
Value N S C
Frequency 5 284 16
Proportion 0.016 0.931 0.052
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
305 0 3
Value N S C
Frequency 2 289 14
Proportion 0.007 0.948 0.046
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
305 0 2
Value S C
Frequency 300 5
Proportion 0.984 0.016
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
305 0 3
Value N S C
Frequency 7 245 53
Proportion 0.023 0.803 0.174
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
305 0 3
Value N S C
Frequency 2 281 22
Proportion 0.007 0.921 0.072
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
305 0 1 H
Value H
Frequency 305
Proportion 1
-----------------------------------------------------------------------------------------------------------
describe(C.boys.L)
C.boys.L
12 Variables 332 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
332 0 1 B
Value B
Frequency 332
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
332 0 3
Value 2nd 3rd 4th
Frequency 128 100 104
Proportion 0.386 0.301 0.313
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
332 0 3
Value N S C
Frequency 6 284 42
Proportion 0.018 0.855 0.127
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
332 0 3
Value N S C
Frequency 3 284 45
Proportion 0.009 0.855 0.136
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
332 0 3
Value N S C
Frequency 1 311 20
Proportion 0.003 0.937 0.060
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
332 0 2
Value S C
Frequency 304 28
Proportion 0.916 0.084
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
332 0 3
Value N S C
Frequency 3 303 26
Proportion 0.009 0.913 0.078
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
332 0 3
Value N S C
Frequency 3 299 30
Proportion 0.009 0.901 0.090
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
332 0 3
Value N S C
Frequency 3 321 8
Proportion 0.009 0.967 0.024
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
332 0 3
Value N S C
Frequency 12 245 75
Proportion 0.036 0.738 0.226
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
332 0 3
Value N S C
Frequency 2 295 35
Proportion 0.006 0.889 0.105
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
332 0 1 L
Value L
Frequency 332
Proportion 1
-----------------------------------------------------------------------------------------------------------
describe(C.boys.H)
C.boys.H
12 Variables 495 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
495 0 1 B
Value B
Frequency 495
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
495 0 3
Value 2nd 3rd 4th
Frequency 232 146 117
Proportion 0.469 0.295 0.236
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
495 0 3
Value N S C
Frequency 2 474 19
Proportion 0.004 0.958 0.038
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
495 0 2
Value S C
Frequency 488 7
Proportion 0.986 0.014
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
495 0 2
Value S C
Frequency 486 9
Proportion 0.982 0.018
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
495 0 2
Value S C
Frequency 491 4
Proportion 0.992 0.008
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
495 0 2
Value S C
Frequency 489 6
Proportion 0.988 0.012
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
495 0 2
Value S C
Frequency 493 2
Proportion 0.996 0.004
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
495 0 2
Value S C
Frequency 493 2
Proportion 0.996 0.004
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
495 0 3
Value N S C
Frequency 1 450 44
Proportion 0.002 0.909 0.089
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
495 0 3
Value N S C
Frequency 1 476 18
Proportion 0.002 0.962 0.036
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
495 0 1 H
Value H
Frequency 495
Proportion 1
-----------------------------------------------------------------------------------------------------------